The Recognition Algorithm of Two-Phase Flow Patterns Based on GoogLeNet+5 Coord Attention

The two-phase flow in a microchannel consists of liquid–liquid and gas–liquid material components. The automatic recognition of flow patterns using deep learning approaches has been emerging. This study aimed to improve the recognition accuracy of flow patterns in two-phase flow images. The convolutional kernels of different sizes in the GoogLeNet algorithm extract image features at different scales. To strengthen the important channel and spatial features, this paper combines a five-layer Coord attention mechanism with the GoogLeNet algorithm to improve the recognition accuracy. The optimized algorithm model was derived from image datasets with different liquid–liquid two-phase flows (NaAlg–Oil, GaInSn–Water), and its accuracy was 95.09% in training and 98.12% in testing. This new model was also applied to predict the flow patterns, with a recognition accuracy of more than 97% in both the liquid–liquid and gas–liquid two-phase flows (water–soybean oil, water–lubricating oil, and argon–water).


Introduction
A flow system composed of two mutually immiscible phases (with at least one phase being a fluid) is called a two-phase flow. In two-phase flow experiments, the continuous phase and the dispersed phase enter the same channel from different channels, presenting different flow patterns. At present, flow patterns play an important role in the fields of biomedicine, material synthesis, and aerospace [1][2][3]. Two-phase flow images show many types of flow pattern; for instance, slug, dripping, and jetting are the flow patterns used to generate monodispersed droplets. The traditional recognition of flow patterns mainly relied on visual observation. Direct visual recognition is effective in low-speed videos, but it is not suitable for high-speed videos. The structure of the two-phase interface is complex, and the flow pattern may change within a video, causing some flow patterns to be misclassified. Considering the large number and the inconsistent quality of the flow pattern images, as well as the subjectivity of human observation, scholars have tried to use deep learning to identify flow patterns.
Convolutional neural networks (CNNs), one of the mainstream approaches in deep learning, can recognize and classify flow patterns in research on two-phase flows. Many scholars have improved the feature extraction ability of CNNs, and thus the network performance, by increasing the network depth [4,5], enhancing the architecture of the convolution module [6][7][8], and adding new functional units [9]. The mobile network [10,11] and the shuffle network [12,13] use depthwise separable convolution to obtain algorithm models with lower computational cost and higher accuracy. The GoogLeNet algorithm [14] uses convolutional kernels of different sizes to extract image features at different scales, enhancing the recognition and classification accuracy of the network model.
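The multi-scale idea can be illustrated with a small shape calculation. The sketch below is not the model used in this paper; it only checks that parallel branches with different kernel sizes keep the same spatial size and can therefore be concatenated along the channel axis. The branch settings are those listed for the "inception (3a)" block in the GoogLeNet paper [14]; the function names are illustrative.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (branch name, kernel size, padding, output channels) for "inception (3a)".
BRANCHES = [
    ("1x1 conv", 1, 0, 64),
    ("3x3 conv", 3, 1, 128),
    ("5x5 conv", 5, 2, 32),
    ("3x3 max pool + 1x1 conv", 3, 1, 32),
]


def inception_output_shape(height, width):
    """Return (channels, height, width) after concatenating all branches."""
    sizes = {
        (conv_out(height, k, 1, p), conv_out(width, k, 1, p))
        for _, k, p, _ in BRANCHES
    }
    # Every branch must produce the same spatial size to allow concatenation.
    if len(sizes) != 1:
        raise ValueError("branches disagree on spatial size")
    out_h, out_w = sizes.pop()
    channels = sum(c for _, _, _, c in BRANCHES)
    return channels, out_h, out_w


print(inception_output_shape(28, 28))  # (256, 28, 28)
```

Each branch sees the same input at a different receptive field, and the padding is chosen so that the feature maps line up before concatenation.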
Adding an attention mechanism to a CNN can improve the recognition accuracy in image identification. SENet [15] strengthens the important channel features and improves the recognition accuracy. CBAM [16], which adds a spatial attention module to the channel attention of SENet, achieves better recognition and classification results. Coord attention [17] embeds the location information into the channel attention, enhancing the accuracy. LKA [18] embeds the self-attention mechanism into the large kernel convolution for extracting global information.
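The difference between a purely channel-wise squeeze and a position-aware one can be shown on a toy feature map. The sketch below covers only the pooling step: an SE-style global average discards position, while the direction-wise averages used by Coord attention [17] keep one value per row and per column. The subsequent shared convolution, sigmoid gating, and reweighting are omitted, and the function names are illustrative.

```python
def global_pool(channel):
    """SE-style squeeze: one scalar per channel, position is discarded."""
    h, w = len(channel), len(channel[0])
    return sum(sum(row) for row in channel) / (h * w)


def directional_pools(channel):
    """Coord-attention-style squeeze: average along each axis separately."""
    h, w = len(channel), len(channel[0])
    pooled_h = [sum(row) / w for row in channel]  # one value per row
    pooled_w = [sum(channel[i][j] for i in range(h)) / h for j in range(w)]  # one per column
    return pooled_h, pooled_w


# Toy 3 x 4 single-channel feature map with one active pixel at row 1, column 2.
fmap = [
    [0, 0, 0, 0],
    [0, 0, 12, 0],
    [0, 0, 0, 0],
]

print(global_pool(fmap))        # 1.0
print(directional_pools(fmap))  # ([0.0, 3.0, 0.0], [0.0, 0.0, 4.0, 0.0])
```

The global average reports only that something is active in the channel, whereas the two directional vectors together locate it at row 1 and column 2.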
Research on recognition algorithms for two-phase flow patterns has mainly focused on the architecture of network algorithms, the optimization of datasets, and the extension of algorithm models. VGG [19,20], ResNet [21,22], and GoogLeNet [23] are the most commonly used algorithms in flow pattern recognition due to their good performance. Some researchers have tried to establish algorithm models to recognize flow patterns using image datasets from different material components [24][25][26], synthetic algorithms [27], and data enhancement algorithms [28]. Nie, F. predicted the flow pattern of nitrogen-liquid nitrogen using a trained algorithm model that was extracted from the tetrafluoromethane-methane flow [23].
To further advance research in this field, this paper introduces an attention mechanism into the GoogLeNet algorithm, improving the recognition accuracy of flow patterns. The optimized model can predict the flow patterns of both liquid-liquid and gas-liquid two-phase flows.

Experiment and Image Dataset
Our two-phase flow experiments used two kinds of material components (NaAlg-oil, GaInSn-water) and two kinds of microchannels (convergent coaxial and vertical coaxial). The experimental parameters and images are listed in Tables 1 and 2. There were four flow patterns (slug, dripping, jetting, and others) in all experiments. The dispersed phases in the slug, dripping, and jetting flow patterns were in the shape of monodisperse droplets. The droplet length of the jetting pattern was significantly smaller than the inner diameter of the microchannel. The droplet length of the dripping pattern was less than 1.5 times the inner diameter of the microchannel, whereas the droplet length of the slug pattern was more than 1.5 times the inner diameter. All flow patterns other than these three were denoted as others.
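The geometric criteria above can be written as a simple rule, shown here only to make the thresholds explicit. It assumes that the slug and dripping patterns are separated at a droplet length of 1.5 times the inner diameter. The text gives no numerical cutoff for "significantly smaller", so `jetting_ratio` is a hypothetical placeholder; the neck stretching and generation frequency that also characterize jetting, and the "others" class, are not captured by this rule.

```python
def classify_droplet(length, inner_diameter, jetting_ratio=0.5):
    """Label a monodisperse droplet pattern from its length-to-diameter ratio.

    jetting_ratio is a hypothetical cutoff, not a value taken from the paper.
    """
    if length <= 0 or inner_diameter <= 0:
        raise ValueError("length and inner_diameter must be positive")
    ratio = length / inner_diameter
    if ratio > 1.5:
        return "slug"      # droplet longer than 1.5 times the inner diameter
    if ratio < jetting_ratio:
        return "jetting"   # droplet much shorter than the inner diameter (assumed cutoff)
    return "dripping"      # remaining cases, including a ratio of exactly 1.5


print(classify_droplet(3.0, 1.0))  # slug
print(classify_droplet(1.0, 1.0))  # dripping
print(classify_droplet(0.2, 1.0))  # jetting
```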
A total of 24,860 images were collected from the raw videos of the experiments, captured by a high-speed camera. The collection rules for the image datasets were as follows: (1) 15 consecutive frames were collected from each experiment video, (2) the size of each image was adjusted to 224 × 224, and (3) similar features, such as the channel architecture and the two-phase flows, were included in each image.
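Rules (1) and (2) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the preprocessing code used in this work: the starting frame, the helper names, and the nearest-neighbour interpolation are all assumed, since the text does not specify them. Images are represented as nested lists of pixel values to keep the example self-contained.

```python
def consecutive_frame_indices(total_frames, start=0, count=15):
    """Indices of `count` consecutive frames beginning at `start`."""
    if start < 0 or start + count > total_frames:
        raise ValueError("requested frames fall outside the video")
    return list(range(start, start + count))


def resize_nearest(image, out_h=224, out_w=224):
    """Resize a 2D image (list of rows) by nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


# Synthetic 480 x 640 grayscale frame standing in for one video frame.
frame = [[(r + c) % 256 for c in range(640)] for r in range(480)]
resized = resize_nearest(frame)

print(consecutive_frame_indices(1000, start=100)[:3])  # [100, 101, 102]
print(len(resized), len(resized[0]))                   # 224 224
```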
The original images had inconsistent sizes, whereas the deep learning algorithm requires an input of 224 × 224 pixels. To meet this requirement, each image was transformed twice from its original size to 224 × 224 pixels (Table 3).

Table 1 (surviving row): GaInSn-water; vertical coaxial; --; GaInSn (7~108 mL/h); water (36~900 mL/h); 132.

Table 2 (example images omitted).
Slug: Monodisperse droplets with a bullet-like or plunger-like shape, and the droplet length is 1.5 times larger than the inner diameter of the microchannel. Number of images: 3624.
Dripping: Monodisperse droplets with an ellipsoid or spherical shape, and the droplet length is 1.5 times smaller than the inner diameter of the microchannel. Number of images: 10,385.
Jetting: Monodisperse droplets with a nearly spherical shape, and the droplet length is significantly smaller than the inner diameter of the microchannel, with fast generation frequency and a stretching neck.
Others: Other flow patterns. Number of images: 4717.
1 Represented blurred image.

Table 3. Original pixels transformed into uniform pixels.
Columns: Slug, Dripping, Jetting, Others. Rows: images of original pixels, images of uniform pixels (images omitted).

In total, the slug pattern occupied 4.2% of the original image datasets. It was obvious that the number of images with a slug pattern was too small to affect the algorithm's
The images in this paper had inconsistent sizes; however, the input image pixel in the deep learning algorithm was normalized to be 224 × 224. To match the requirement of the image pixel, the image size was transformed twice from the original pixels to 224 × 224 (Table 3). Table 3. Original pixels transformed into uniform pixels.

Slug
Dripping Jetting Others

Images of original pixels
Images of uniform pixels In total, the slug pattern occupied 4.2% of the original image datasets. It was obviou that the number of images with a slug pattern was too small to affect the algorithm' than the inner diameter of the microchannel 1 Jetting Monodisperse droplets with a nearly spherical shape, and the droplet length is significantly smaller than the inner diameter of the microchannel, with fast generation frequency and a stretching neck 1  There were four flow patterns (slug, dripping, jetting, and others) in all experiments. The dispersed phases in the slug, dripping, and jetting flow patterns were in the shape of monodisperse droplets. The droplet length of the jetting pattern was obviously smaller than the microchannel width. The droplet length of the dripping pattern was 1.5 times smaller than the inner diameter of the microchannel. Meanwhile, the droplet length of the slug pattern was 1.5 times larger than the inner diameter of the microchannel. In addition to these three flow patterns, the other flow patterns in the two-phase flow were denoted as others.
A total of 24,860 images were collected from the raw videos of the experiments, captured by a high-speed camera. The collection rules for the image datasets were as follows: (1) 15 images with continuous frames were collected from each experiment video, (2) the size of each image was adjusted to be 224 × 224, and (3) similar features were included in each image, such as the channel architecture, the two-phase flows, etc.
The original images had inconsistent sizes, whereas the input of the deep learning algorithm was normalized to 224 × 224 pixels. To meet this requirement, the image size was transformed twice from the original pixels to 224 × 224 (Table 3).

Table 3. Original pixels transformed into uniform pixels. [Image table: columns Slug, Dripping, Jetting, and Others; rows Images of original pixels and Images of uniform pixels.]

In total, the slug pattern occupied 4.2% of the original image datasets. The number of slug images was clearly too small compared to the other flow patterns, which could impair the algorithm's accuracy. Thus, a data enhancement algorithm was introduced to increase the percentage of slug patterns to 16.9%. As a result, the percentages of the other three flow patterns (dripping, jetting, and others) were 46.66%, 18.97%, and 17.47%, respectively.
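The effect of enlarging a minority class can be checked with simple arithmetic. The sketch below is a minimal illustration, assuming that only the slug class is enlarged while the other classes stay fixed; the function names and the class counts are hypothetical and are not the counts of the datasets in this paper.

```python
import math


def minority_count_for_share(n_rest: int, target_share: float) -> int:
    """Smallest minority-class size s such that s / (s + n_rest) >= target_share."""
    if not 0.0 < target_share < 1.0:
        raise ValueError("target_share must be strictly between 0 and 1")
    # Solve s / (s + n_rest) = target_share for s and round up to whole images.
    return math.ceil(target_share * n_rest / (1.0 - target_share))


def class_shares(counts: dict) -> dict:
    """Fraction of the dataset occupied by each class."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Hypothetical counts (assumption): slug starts at 4.2% of 1000 images.
counts = {"slug": 42, "dripping": 480, "jetting": 250, "others": 228}
n_rest = sum(n for name, n in counts.items() if name != "slug")

# Enlarge the slug class until it reaches a 16.9% share.
counts["slug"] = minority_count_for_share(n_rest, 0.169)
shares = class_shares(counts)
print(counts["slug"], round(shares["slug"], 4))
```

Because enlarging one class increases the total, the shares of the untouched classes shrink even though their image counts do not change.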

GoogLeNet Algorithm
In general, deep learning algorithms describe the full image information with three-dimensional matrices. The most popular algorithms adopt a 224 × 224 × 3 matrix as the input, in which 224 × 224 is the size of the image and 3 is the number of red, green, and blue channels. During processing, many convolution and pooling layers reduce the size of the feature maps while increasing the number of channels. Finally, the three-dimensional matrices are transformed into a one-dimensional matrix, which is sent to the softmax classifier to calculate the multi-classification output.
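The final classification step can be illustrated with a small softmax sketch. It assumes that the network has already reduced an image to one score (logit) per flow pattern; the scores and the helper names `softmax` and `classify` are hypothetical.

```python
import math

CLASSES = ["slug", "dripping", "jetting", "others"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return the most probable flow pattern and the full probability vector."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


# Hypothetical scores for one image (assumption, not measured data).
label, probs = classify([2.0, 0.5, -1.0, 0.1])
print(label, [round(p, 3) for p in probs])
```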
This paper studied the efficacy of four deep learning algorithms in the pattern recognition of two-phase flows.
GoogLeNet [14]: The GoogLeNet algorithm (Figure 1) is 22 layers deep when only the layers with parameters are counted, or 27 layers deep when the pooling layers are also counted. Its first four layers are convolutional and pooling layers. Beneath these layers, GoogLeNet stacks inception modules (Figure 1): there are two inception modules for each of the 28 × 28 and 7 × 7 feature maps, and five inception modules for the 14 × 14 feature map. The inception module connects multiple convolution kernels and a maximum pooling layer in parallel; it consists of three convolutions with different sizes and a maximum pooling layer (Figure 2). Because the computation of the multiple convolution kernels grows explosively as the number of layers increases, the inception module introduces 1 × 1 convolutions to reduce the computational cost. The parallel 1 × 1, 3 × 3, and 5 × 5 convolutions can extract richer features in a layer, resulting in higher recognition accuracy.
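The saving obtained from the 1 × 1 convolution can be estimated by counting multiply-accumulate operations. The sketch below is only a worked example: the channel counts (192 input, 16 reduced, 32 output channels on a 28 × 28 feature map) are illustrative assumptions and are not taken from this paper, and the helper `conv_macs` is hypothetical.

```python
def conv_macs(height, width, kernel, c_in, c_out):
    """Multiply-accumulate count of a kernel x kernel convolution that keeps
    the height x width map size (stride 1, 'same' padding), biases ignored."""
    return height * width * kernel * kernel * c_in * c_out


# Direct 5 x 5 convolution from 192 to 32 channels on a 28 x 28 feature map.
direct = conv_macs(28, 28, 5, 192, 32)

# Same output, but a 1 x 1 convolution first reduces 192 channels to 16.
reduced = conv_macs(28, 28, 1, 192, 16) + conv_macs(28, 28, 5, 16, 32)

print(direct, reduced, round(direct / reduced, 2))
```

Under these assumed sizes, the reduced branch needs roughly one tenth of the operations of the direct 5 × 5 convolution.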
VGG16 [5]: The Visual Geometry Group Network (VGG16) has a total of 13 convolutional layers and 3 fully connected layers. All of the convolution and pooling layers adopt 3 × 3 convolutional kernels and 2 × 2 pooling kernels.
ViT [29]: Vision Transformer (ViT) introduces the Transformer module. The input of the Transformer module is a one-dimensional matrix transformed from the three-dimensional matrices in the algorithm. Flattening the whole image directly into a one-dimensional matrix raises the issue of a huge computational cost. To reduce the amount of calculation, ViT divides the image into 16 windows and flattens them into a one-dimensional matrix.
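The window-and-flatten step can be sketched on a toy example. The code below assumes a single-channel image stored as a list of rows and square, non-overlapping windows; the 4 × 4 image, the window size of 2, and the function name `split_into_windows` are hypothetical choices made only for illustration.

```python
def split_into_windows(image, win):
    """Split a 2D grid into non-overlapping win x win windows and
    flatten each window row by row into a one-dimensional list."""
    height, width = len(image), len(image[0])
    if height % win or width % win:
        raise ValueError("image size must be a multiple of the window size")
    windows = []
    for top in range(0, height, win):
        for left in range(0, width, win):
            windows.append([image[r][c]
                            for r in range(top, top + win)
                            for c in range(left, left + win)])
    return windows


# Toy 4 x 4 image whose pixel values equal their row-major index.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = split_into_windows(image, 2)
print(len(tokens), tokens[0], tokens[-1])
```

Each flattened window then acts as one element of the sequence handed to the Transformer module.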
Swin-T [30]: Swin-Transformer (Swin-T) also applies the Transformer module. Unlike ViT, which calculates self-attention over the whole image, Swin-T first calculates self-attention within each small window and then merges the small windows into larger windows. After three iterations of window merging and calculation, the output of the Transformer is sent into the fully connected layer and the softmax layer.
For comparison, VGG16, GoogLeNet, ViT, and Swin-T were each trained for 50 iterations to recognize the flow patterns in our image datasets of liquid-liquid two-phase flows (Figure 3). The green solid line represents VGG16, with a training accuracy rate of 58.16%. The yellow solid line represents ViT, with a training accuracy rate of 86.71%. The blue solid line represents Swin-T, with a training accuracy rate of 91.74%. The red solid line represents GoogLeNet, with a training accuracy rate of 94.83%. Among the four algorithms, GoogLeNet showed the best recognition accuracy for the two-phase flow patterns.

Coordinate Attention
In order to improve the recognition accuracy of GoogLeNet beyond 94.83%, this paper implanted an attention mechanism into the GoogLeNet algorithm.
The principle of an attention mechanism is to locate the interesting information and suppress the useless information by changing the weights of different areas. Attention has been widely used in computer vision and natural language processing.
The following four common attention mechanisms were considered.
Coord: Coordinate attention (Coord) embeds positional information into channel attention and strengthens both the channel and spatial features (Figure 4).


[Figure 4. Structure of Coord attention; the recoverable block labels are Conv2d, Sigmoid, and Re-weight.]

As shown in Figure 4, Coord attention implements two pooling kernels, (H, 1) and (1, W), to encode spatial information along the horizontal and vertical directions, respectively. Following the standard formulation of coordinate attention, Equations (1) and (2) give the calculations:

$$Z_c^h(h) = \frac{1}{W} \sum_{0 \le i < W} x_c(h, i) \quad (1)$$

$$Z_c^w(w) = \frac{1}{H} \sum_{0 \le j < H} x_c(j, w) \quad (2)$$

where $x_c$ is the input of the c-th channel, $Z_c^h$ is the H-dimensional output of the c-th channel, and $Z_c^w$ is the W-dimensional output of the c-th channel. The two outputs are concatenated and passed through a shared 1 × 1 convolution, which compresses the channel dimension, followed by batch normalization; this encodes the spatial information in the vertical and horizontal directions. The convolution transformation function is

$$F = \delta\left(F_1\left(\left[Z^h, Z^w\right]\right)\right)$$

where $[\cdot, \cdot]$ denotes concatenation along the spatial dimension, $F_1$ is the 1 × 1 convolution, $F \in \mathbb{R}^{C/r \times (H + W)}$ (with $r$ the channel reduction ratio), and $\delta$ is the nonlinear activation function. The output $F$ is then split into a pair of feature maps, $f^h \in \mathbb{R}^{C/r \times H}$ and $f^w \in \mathbb{R}^{C/r \times W}$. Finally, the spatial features in the horizontal and vertical directions are calculated separately through 1 × 1 convolutions and sigmoid functions, and the results are put together to recalculate the weights. The formulae of the output Y are

$$g^h = \sigma\left(F_h\left(f^h\right)\right), \qquad g^w = \sigma\left(F_w\left(f^w\right)\right)$$

$$Y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$$

where $F_h$ and $F_w$ are the two 1 × 1 convolutions and $\sigma$ is the sigmoid function.

Coord attention accounts for both the channel-to-channel relationships and the location information. It captures not only cross-channel information but also direction-aware and position-sensitive information, which helps to accurately locate and identify the target areas.
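The directional pooling and the final re-weighting can be illustrated on a single channel. The sketch below is deliberately simplified and is not the module used in this paper: the learned 1 × 1 convolutions, the channel reduction, and the batch normalization are all omitted (treated as identity), so the gates are just the sigmoid of the directional averages. All function names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def pool_h(x):
    """Average each row over the width: one value per height position."""
    return [sum(row) / len(row) for row in x]


def pool_w(x):
    """Average each column over the height: one value per width position."""
    height = len(x)
    return [sum(x[i][j] for i in range(height)) / height
            for j in range(len(x[0]))]


def coord_reweight(x):
    """Re-weight one H x W channel with a height gate and a width gate.
    Assumption: learned convolutions and batch normalization are skipped."""
    g_h = [sigmoid(v) for v in pool_h(x)]
    g_w = [sigmoid(v) for v in pool_w(x)]
    return [[x[i][j] * g_h[i] * g_w[j] for j in range(len(x[0]))]
            for i in range(len(x))]


# Toy 2 x 3 channel with activity concentrated in the second row.
x = [[0.0, 0.0, 0.0],
     [0.0, 4.0, 2.0]]
y = coord_reweight(x)
print(pool_h(x), pool_w(x))
```

Because each gate depends on only one coordinate, a position is emphasized only when both its row and its column carry strong responses, which is the position-sensitive behavior described above.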
CBAM: The convolutional block attention module (CBAM) is divided into channel and spatial modules. The channel attention module compresses the feature map along the spatial dimension, whereas the spatial attention module compresses it along the channel dimension. CBAM thus generates attention maps sequentially in two independent dimensions (channel and space).
LKA: Large kernel attention (LKA) introduces self-attention into large convolution kernels. A convolution with a large kernel size is decomposed into a depthwise convolution, a depthwise dilated convolution, and a pointwise (1 × 1) convolution across channels. LKA establishes the correlation between points and generates the attention map from the large kernel convolution, which provides adaptability in both the channel dimension and the spatial dimension.
SENet: Squeeze-and-Excitation Networks (SENet) are divided into squeeze and excitation modules. The squeeze module performs the feature compression in the spatial dimension, turning the 2D feature channel into a real number. The excitation module calculates the weights and correlations between each feature channel.
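The squeeze and excitation steps can be sketched in a few lines. The example below assumes a two-channel feature map and uses hand-picked, hypothetical weights for the two fully connected layers of the excitation module; it illustrates the idea only and is not the configuration used in this paper.

```python
import math


def squeeze(feature_maps):
    """Global average pooling: turn each 2D channel into one real number."""
    return [sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
            for fm in feature_maps]


def dense(vec, weights):
    """Fully connected layer without bias: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def excite(z, w_down, w_up):
    """Bottleneck (ReLU) followed by expansion (sigmoid) to get channel weights."""
    hidden = [max(0.0, v) for v in dense(z, w_down)]
    return [1.0 / (1.0 + math.exp(-v)) for v in dense(hidden, w_up)]


def se_block(feature_maps, w_down, w_up):
    """Scale every channel by its excitation weight."""
    scales = excite(squeeze(feature_maps), w_down, w_up)
    return [[[v * s for v in row] for row in fm]
            for fm, s in zip(feature_maps, scales)]


# Two toy 2 x 2 channels and hypothetical weights (assumptions).
fms = [[[1.0, 3.0], [2.0, 2.0]],
       [[0.0, 4.0], [4.0, 0.0]]]
w_down = [[0.5, 0.5]]      # 2 channels -> 1 hidden unit
w_up = [[1.0], [-1.0]]     # 1 hidden unit -> 2 channel weights
scales = excite(squeeze(fms), w_down, w_up)
out = se_block(fms, w_down, w_up)
print(squeeze(fms), [round(s, 3) for s in scales])
```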

Optimized Algorithm Architecture
Four kinds of attention (Coord, CBAM, LKA, and SENet) were introduced into the GoogLeNet algorithm and evaluated on the training and validation datasets. Each architecture embedded five layers of attention, one in each feature map with sizes of 224 × 224, 56 × 56, 28 × 28, 14 × 14, and 7 × 7. Figure 5 illustrates one of the four attention architectures, and Table 4 gives their recognition accuracy. Compared with the other three kinds of attention, Coord attention achieved the best accuracy. The resulting GoogLeNet+5 Coord attention algorithm was therefore proposed to recognize flow patterns.

GoogLeNet focuses on the local features extracted by convolution kernels of different sizes; however, it is not good at global feature extraction. Coord attention can extract global features and also strengthen the important spatial and channel features. It thus compensates for the global, spatial, and channel features that GoogLeNet lacks because of its computational reduction. Introducing Coord attention into the GoogLeNet algorithm improved the performance in recognizing the flow patterns, with a high accuracy and low loss in the training and validation steps.
In our two-phase flow images, the GoogLeNet algorithm easily extracted the local features, especially the droplet features. Coord attention was added to extract the global features of the images, such as the background and microchannel features. The GoogLeNet+5 Coord attention algorithm combined the local features of the two-phase flows with the global features of the background and microchannels, so that the background features were clearly distinguished and the flow patterns were recognized with a higher accuracy.

Training and Testing Results
The dataset was separated into model and test datasets at a ratio of 8:2, and the model dataset was also separated into training and validation datasets at a ratio of 8:2. For the training dataset, setting the batch size to 32, the learning rate to 0.0001, and the iteration number to 50, the recognition results of GoogLeNet and GoogLeNet+5 Coord are plotted together in Figure 6.
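The nested 8:2 split can be sketched as follows. This is a minimal illustration that assumes a random shuffle with a fixed seed and rounding to whole images; the paper does not state how the images were assigned or rounded, and the function name `split_indices` is hypothetical.

```python
import random


def split_indices(n_images, holdout=0.2, seed=0):
    """Split image indices 8:2 into model and test sets, then split the
    model set 8:2 again into training and validation sets."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)  # fixed seed for repeatability

    n_test = round(n_images * holdout)
    test, model = indices[:n_test], indices[n_test:]

    n_val = round(len(model) * holdout)
    val, train = model[:n_val], model[n_val:]
    return train, val, test


train, val, test = split_indices(100)
print(len(train), len(val), len(test))
```

With this rounding rule, a set of 24,860 images would yield 15,910 training, 3,978 validation, and 4,972 test images; the exact counts used in the paper are not given.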
The blue solid curve represents the training results of GoogLeNet, with an accuracy of 94.83% and loss of 0.1245. The red solid curve represents the training results of GoogLeNet+5 Coord, with an accuracy of 95.09% and loss of 0.1222. This indicates that the new algorithm, GoogLeNet+5 Coord, had a higher training accuracy and lower training loss than the traditional GoogLeNet algorithm, by 0.26% and −0.0023, respectively.

In validation, GoogLeNet+5 Coord reached an accuracy of 98.87% and a loss of 0.03221. The new GoogLeNet+5 Coord had better validation results for accuracy (+0.2%) and loss (−0.00353) than the traditional GoogLeNet algorithm.

After the establishment of the algorithm model, the remaining 20% of the data were used as the testing dataset to test the model precision of GoogLeNet+5 Coord and GoogLeNet (Figure 8). For 50 iterations with the testing dataset, the average recognition accuracy of all images when applying GoogLeNet+5 Coord was 98.12%, which was about 0.29% higher than that of GoogLeNet.

Prediction Results
The optimized algorithm model with a higher accuracy (>98%) was derived from the image datasets with liquid-liquid two-phase flows (NaAlg-oil, GaInSn-water).
This paper extended the model of GoogLeNet+5 Coord to predict the flow patterns in different gas-liquid and liquid-liquid two-phase components. Similarly, the datasets contained 600 images with the four flow patterns (Table 5). The average accuracy was 97.65% in the prediction of gas-liquid flow patterns.


Material Components
Dispersed Phase

Experimental Groups Images
Oil-water Vegetable oil Water 59 Oil-water Lubricating oil Deionized water 52 Argon-water Argon Water 93 Figure 9 depicts the prediction accuracy of the two algorithms in both liquid and gas-liquid two-phase flows. GoogLeNet+5 Coord had better perfo GoogLeNet. The new algorithm model could accurately identify the four fl with an average accuracy of 97.83%. GoogLeNet had poor accuracy in identif flow pattern, with an average accuracy of 84.25%. This shows that GoogLeN had higher accuracy and better consistency than GoogLeNet. In our two-phase flow images, the GoogLeNet algorithm easily extrac features, especially for the small droplet features. For example, for jetting, the volution of GoogLeNet was 7 × 7, which was approximate to the size of a dr fore, the prediction accuracy for jetting in GoogLeNet was higher than in G Coord. Moreover, the convolution of 7 × 7 was much smaller than the dropl slug flow pattern, and the recognition accuracy of the slug pattern in predicti Coord attention was supplemented to extract the global features of images  Argon-water Argon Water 93 Figure 9 depicts the prediction accuracy of the two algorithms in both the l liquid and gas-liquid two-phase flows. GoogLeNet+5 Coord had better performanc GoogLeNet. The new algorithm model could accurately identify the four flow pa with an average accuracy of 97.83%. GoogLeNet had poor accuracy in identifying th flow pattern, with an average accuracy of 84.25%. This shows that GoogLeNet+5 had higher accuracy and better consistency than GoogLeNet. In our two-phase flow images, the GoogLeNet algorithm easily extracted th features, especially for the small droplet features. For example, for jetting, the large volution of GoogLeNet was 7 × 7, which was approximate to the size of a droplet. fore, the prediction accuracy for jetting in GoogLeNet was higher than in GoogLe Coord. 
Moreover, the convolution of 7 × 7 was much smaller than the droplet size slug flow pattern, and the recognition accuracy of the slug pattern in prediction wa Coord attention was supplemented to extract the global features of images, such

Material Components
Dispersed Phase

Experimental Groups Images
Oil-water Vegetable oil Water 59 Oil-water Lubricating oil Deionized water 52 Argon-water Argon Water 93 Figure 9 depicts the prediction accuracy of the two algorithms in both the liquid liquid and gas-liquid two-phase flows. GoogLeNet+5 Coord had better performance than GoogLeNet. The new algorithm model could accurately identify the four flow patterns with an average accuracy of 97.83%. GoogLeNet had poor accuracy in identifying the slug flow pattern, with an average accuracy of 84.25%. This shows that GoogLeNet+5 Coord had higher accuracy and better consistency than GoogLeNet. In our two-phase flow images, the GoogLeNet algorithm easily extracted the loca features, especially for the small droplet features. For example, for jetting, the largest con volution of GoogLeNet was 7 × 7, which was approximate to the size of a droplet. There fore, the prediction accuracy for jetting in GoogLeNet was higher than in GoogLeNet+ Coord. Moreover, the convolution of 7 × 7 was much smaller than the droplet size of th slug flow pattern, and the recognition accuracy of the slug pattern in prediction was poor Coord attention was supplemented to extract the global features of images, such as th

Material Components
Dispersed Phase

Experimental Groups Images
Oil-water Vegetable oil Water 59 Oil-water Lubricating oil Deionized water 52 Argon-water Argon Water 93 Figure 9 depicts the prediction accuracy of the two algorithms in both the liquidliquid and gas-liquid two-phase flows. GoogLeNet+5 Coord had better performance than GoogLeNet. The new algorithm model could accurately identify the four flow patterns, with an average accuracy of 97.83%. GoogLeNet had poor accuracy in identifying the slug flow pattern, with an average accuracy of 84.25%. This shows that GoogLeNet+5 Coord had higher accuracy and better consistency than GoogLeNet. In our two-phase flow images, the GoogLeNet algorithm easily extracted the local features, especially for the small droplet features. For example, for jetting, the largest convolution of GoogLeNet was 7 × 7, which was approximate to the size of a droplet. Therefore, the prediction accuracy for jetting in GoogLeNet was higher than in GoogLeNet+5 Coord. Moreover, the convolution of 7 × 7 was much smaller than the droplet size of the slug flow pattern, and the recognition accuracy of the slug pattern in prediction was poor. Coord attention was supplemented to extract the global features of images, such as the

Oil-water
Lubricating oil Deionized water 52 Micromachines 2023, 14, x FOR PEER REVIEW Table 5. Experiments for flow pattern prediction.

Material Components
Dispersed Phase

Experimental Groups Images
Oil-water Vegetable oil Water 59 Oil-water Lubricating oil Deionized water 52 Argon-water Argon Water 93 Figure 9 depicts the prediction accuracy of the two algorithms in both th liquid and gas-liquid two-phase flows. GoogLeNet+5 Coord had better performa GoogLeNet. The new algorithm model could accurately identify the four flow with an average accuracy of 97.83%. GoogLeNet had poor accuracy in identifying flow pattern, with an average accuracy of 84.25%. This shows that GoogLeNet had higher accuracy and better consistency than GoogLeNet. In our two-phase flow images, the GoogLeNet algorithm easily extracted features, especially for the small droplet features. For example, for jetting, the lar volution of GoogLeNet was 7 × 7, which was approximate to the size of a drople fore, the prediction accuracy for jetting in GoogLeNet was higher than in Goog Coord. Moreover, the convolution of 7 × 7 was much smaller than the droplet s slug flow pattern, and the recognition accuracy of the slug pattern in prediction w Coord attention was supplemented to extract the global features of images, su  Argon-water Argon Water 93 Figure 9 depicts the prediction accuracy of the two algorithms in both the liqu liquid and gas-liquid two-phase flows. GoogLeNet+5 Coord had better performance th GoogLeNet. The new algorithm model could accurately identify the four flow patter with an average accuracy of 97.83%. GoogLeNet had poor accuracy in identifying the s flow pattern, with an average accuracy of 84.25%. This shows that GoogLeNet+5 Co had higher accuracy and better consistency than GoogLeNet. In our two-phase flow images, the GoogLeNet algorithm easily extracted the lo features, especially for the small droplet features. For example, for jetting, the largest c volution of GoogLeNet was 7 × 7, which was approximate to the size of a droplet. The fore, the prediction accuracy for jetting in GoogLeNet was higher than in GoogLeNe Coord. 

| Experimental Groups | Material Components: Dispersed Phase | Material Components: Continuous Phase | Images |
| --- | --- | --- | --- |
| Oil-water | Vegetable oil | Water | 59 |
| Oil-water | Lubricating oil | Deionized water | 52 |
| Argon-water | Argon | Water | 93 |

Figure 9 depicts the prediction accuracy of the two algorithms in both the liquid-liquid and gas-liquid two-phase flows. GoogLeNet+5 Coord had better performance than GoogLeNet. The new algorithm model could accurately identify the four flow patterns, with an average accuracy of 97.83%. GoogLeNet had poor accuracy in identifying the slug flow pattern, with an average accuracy of 84.25%. This shows that GoogLeNet+5 Coord had higher accuracy and better consistency than GoogLeNet.
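As a minimal sketch of how an average accuracy over the four flow patterns could be computed, the snippet below takes the unweighted mean of the per-pattern accuracies. The text does not state whether the reported average is unweighted, so that is an assumption. The counts are hypothetical placeholders rather than the paper's results, and the function name and the label "fourth_pattern" are illustrative only.

```python
from statistics import mean


def per_pattern_accuracy(correct, total):
    """Return the accuracy (in %) of each flow pattern from correct/total counts."""
    return {name: 100.0 * correct[name] / total[name] for name in total}


# Hypothetical placeholder counts, NOT the results reported in this paper.
total = {"slug": 50, "dripping": 50, "jetting": 50, "fourth_pattern": 50}
correct = {"slug": 49, "dripping": 48, "jetting": 50, "fourth_pattern": 47}

accuracy = per_pattern_accuracy(correct, total)

# Assumption: the average is the unweighted mean over the four patterns.
average_accuracy = mean(accuracy.values())
```

With an unweighted mean, a single poorly recognized pattern lowers the average noticeably, which is consistent with the gap between the two algorithms described above.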

In our two-phase flow images, the GoogLeNet algorithm easily extracted the local features, especially the small droplet features. For example, for jetting, the largest convolution kernel of GoogLeNet was 7 × 7, which was approximately the size of a droplet. Therefore, the prediction accuracy for jetting was higher with GoogLeNet than with GoogLeNet+5 Coord. By contrast, the 7 × 7 convolution was much smaller than the droplet size of the slug flow pattern, so the recognition accuracy of the slug pattern in prediction was poor. Coord attention was added to extract the global features of images, such as the background features, large droplet features, and microchannel features. The innovative GoogLeNet + 5 layers of attention algorithm combined the local features of the two-phase flows with the global features of the large droplets, the background, and the microchannels to accurately predict the four flow patterns (Figure 10).
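The contrast between a local 7 × 7 window and the global context gathered by coordinate attention can be illustrated with a simplified sketch. Coordinate attention, as generally described, pools a feature map along its height and width separately and uses the pooled vectors to gate every position. The code below is not the implementation used in this study. It assumes a single channel and replaces the learned 1 × 1 convolutions and normalisation with the identity, and all function names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def directional_pool(fmap):
    """Average a 2-D feature map along each axis.

    Returns (row_means, col_means): one value per row (pooled over the
    width) and one value per column (pooled over the height).
    """
    h, w = len(fmap), len(fmap[0])
    row_means = [sum(row) / w for row in fmap]
    col_means = [sum(fmap[i][j] for i in range(h)) / h for j in range(w)]
    return row_means, col_means


def coord_attention_sketch(fmap):
    """Reweight each pixel by sigmoid gates built from its row and column means.

    Assumption: the learned transforms of a real coordinate attention
    block are replaced by the identity, so only the pooling and gating
    structure is shown.
    """
    row_means, col_means = directional_pool(fmap)
    gate_h = [sigmoid(m) for m in row_means]
    gate_w = [sigmoid(m) for m in col_means]
    return [
        [fmap[i][j] * gate_h[i] * gate_w[j] for j in range(len(gate_w))]
        for i in range(len(gate_h))
    ]


# Toy 4 x 12 map: row 1 holds a "slug" 10 pixels long, which is longer
# than a single 7 x 7 window can cover.
fmap = [[0.0] * 12 for _ in range(4)]
for j in range(1, 11):
    fmap[1][j] = 1.0

row_means, col_means = directional_pool(fmap)
out = coord_attention_sketch(fmap)
```

In this toy map, the mean of row 1 summarises the entire 10-pixel slug in one value, whereas a 7-pixel-wide window sees only part of it at any position. This is the sense in which the pooled gates carry global rather than local information.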

Conclusions
This paper studied flow pattern recognition using image datasets of gas-liquid and liquid-liquid two-phase flows.
1. Compared with the VGG16, ViT, and Swin-T algorithms, the GoogLeNet algorithm had a higher accuracy in recognizing and classifying flow patterns.
2. Different attention mechanisms were introduced to improve the recognition accuracy, and the optimal algorithm was found to be GoogLeNet+5 Coord, which strengthened the important channel and spatial features and extracted the two-phase flow features simultaneously.
3. The optimized GoogLeNet+5 Coord algorithm was trained on data from different liquid-liquid two-phase flows, and it could predict the liquid-liquid and gas-liquid two-phase flow patterns with a high accuracy of more than 97%.
4. The optimized algorithm model was a normalized model for flow pattern recognition in both liquid-liquid and gas-liquid two-phase flows.